PCA Explained. Principal Component Analysis(PCA) is an… 您所在的位置:网站首页 principal component analysis pca with scikit PCA Explained. Principal Component Analysis(PCA) is an…

PCA Explained. Principal Component Analysis(PCA) is an…

2023-03-26 17:06| 来源: 网络整理| 查看: 265

PCA Explained

Principal Component Analysis(PCA) is an essential algorithm in a data scientist's toolkit. It is used to reduce the dimensionality of a dataset while retaining as much of the original information as possible. This makes it particularly useful for analyzing large datasets with many variables, where it can be difficult to visualize and interpret the data. We will understand its uses, working, and implementation. We will also discuss some common pitfalls and limitations of PCA and how to overcome them.

Why We Use PCA

One of the top uses of PCA is visualization. To show why, we need to see the various representations of data.

The image above is one-dimensional data. We can represent it in a line and discriminate between apples, tomatoes and bananas.import plotly.express as pxdf = px.data.iris() # iris is a pandas DataFramefig = px.scatter(df, x="sepal_width", y="sepal_length",color ="species")fig.show()The image above is a 2D scatter plot representing the two variables sepal_length and sepal_width. We can see how we can discriminate between the various species using the two columns.import plotly.express as pxdf = px.data.iris() # iris is a pandas DataFramefig = px.scatter_3d(df, x="sepal_width", y="sepal_length",z='petal_width',color ="species")fig.show()

And finally, we can represent and classify the data using 3D plots. However, this is the hard limit of data visualisation since humans can’t see dimensions above four, so we can use Principal Component Analysis to pick the best components(1,2,3) to represent the data.

Mathematical Explanation of PCA

It’s important to understand the mathematics behind an Algorithm to understand the usability of the dataset. Unfortunately, this concept is complex and can’t be explained as well using the written word alone. I find the channel StatQuest is able to explain this concept quite well in the following video.

PCA step-by-step

Implementing PCA with Python

We will use scikit-learn’s library to make a component analysis of our 4-dimensional data.

I will be using a stroke prediction dataset that can be found on Aishwarya Ramakrishnan’s gist repository. The full notebook can be found on my Github Page and my collab notebook.

# Loading the dependencies we will be usingimport pandas as pdimport matplotlib.pyplot as pltimport plotly.express as pximport seaborn as sns# Loading the Dataset into memoryurl = "https://gist.githubusercontent.com/aishwarya8615/d2107f828d3f904839cbcb7eaa85bd04/raw/cec0340503d82d270821e03254993b6dede60afb/healthcare-dataset-stroke-data.csv"df = pd.read_csv(url,index_col = 0)df

After loading the Dataset, we will do some preprocessing and EDA, which can be found in my notebook and then proceed to Principal Component Analysis.

from sklearn.decomposition import PCApca = PCA(n_components = 3)# These are the variabls that we will be using for the PCAcols = ["age", "avg_glucose_level","bmi","hypertension", "heart_disease"]pca.fit(df[["age", "avg_glucose_level","bmi","hypertension", "heart_disease"]])

Array = pd.DataFrame(pca.fit_transform(df[["age", "avg_glucose_level","bmi","hypertension", "heart_disease"]]).tolist(),columns = ["PC1","PC2","PC3"])Array.info()

Array["stroke"] = list(df["stroke"])

px.scatter_3d(Array,x = "PC1" , y= "PC2" ,z = "PC3" ,color = "stroke")

Although faint, one can clearly see a linear separation in the data at the 0 of the x-axis. This shows the data will likely be classified using linear algorithms.Conclusion

Principal Component Analysis (PCA) is a powerful technique in data analysis and machine learning that can help us identify the underlying patterns in complex datasets. Mastering PCA can greatly enhance your ability to extract insights and make informed decisions from complex data, making it an essential skill for any data scientist or machine learning practitioner.

BECOME a WRITER at MLearning.aiMlearning.ai Submission SuggestionsHow to become a writer on Mlearning.ai

medium.com



【本文地址】

公司简介

联系我们

今日新闻

    推荐新闻

    专题文章
      CopyRight 2018-2019 实验室设备网 版权所有